Clustering Large-Scale Data Based On Modified Affinity Propagation Algorithm

Authors

  • Ahmed M. Serdah
  • Wesam M. Ashour
Abstract

Traditional clustering algorithms are no longer suitable for data mining applications that use large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate on data sets of ordinary size, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of the AP algorithm. The proposed methods are designed to ensure both low time complexity and good clustering accuracy. First, the data set is divided into several subsets using one of two methods: random fragmentation or K-means. Second, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars. Third, the inverse weighted clustering algorithm is run on all local cluster exemplars to select well-suited global exemplars for the whole data set. Finally, every data point is assigned to a cluster according to its similarity to the global exemplars. Results show that the proposed method significantly reduces clustering time and produces more accurate clustering results than the AP, KAP, and HAP algorithms.
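The first and last steps of the pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the use of negative squared Euclidean distance as the similarity measure are assumptions made for illustration. The KAP and inverse weighted clustering steps are not implemented here; the global exemplars are supplied by hand as a hypothetical result of those steps.

```python
import random


def random_fragmentation(points, n_subsets, seed=0):
    """Random variant of step 1: shuffle point indices and deal them into n_subsets parts."""
    rng = random.Random(seed)
    indices = list(range(len(points)))
    rng.shuffle(indices)
    # Striding over the shuffled list gives disjoint, near-equal-sized subsets.
    return [indices[i::n_subsets] for i in range(n_subsets)]


def similarity(a, b):
    """Negative squared Euclidean distance (an assumed similarity measure)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_exemplars(points, exemplars):
    """Final step: label each point with the index of its most similar exemplar."""
    return [
        max(range(len(exemplars)), key=lambda k: similarity(p, exemplars[k]))
        for p in points
    ]


# Toy data: two well-separated groups in the plane.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.3), (4.8, 5.0)]

# Step 1: split the data set into subsets of point indices.
subsets = random_fragmentation(points, n_subsets=2)

# Steps 2 and 3 (KAP per subset, then inverse weighted clustering) are skipped;
# these exemplars are a hypothetical stand-in for their output.
global_exemplars = [(0.1, 0.1), (5.0, 5.0)]

# Step 4: assign every data point to its most similar global exemplar.
labels = assign_to_exemplars(points, global_exemplars)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Because the expensive exemplar selection runs only on the subsets and then on the much smaller set of local exemplars, the final pass over the full data set costs one similarity evaluation per point and global exemplar.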

Similar resources

Partition Affinity Propagation for Clustering Large Scale of Data in Digital Library

Data clustering is very useful in helping users navigate large-scale data in a digital library. In this paper, we present an improved algorithm, based on Affinity Propagation, for clustering large-scale data sets with dense relationships. First, the input data are divided into several groups and Affinity Propagation is applied to each of them. Results from the first step are grouped together in...

A New Knowledge-Based System for Diagnosis of Breast Cancer by a combination of the Affinity Propagation and Firefly Algorithms

Breast cancer has become a widespread disease among young women around the world. Expert systems developed with data mining techniques are valuable tools in the diagnosis of breast cancer and can help physicians in the decision-making process. This paper presents a new hybrid data mining approach to classify two groups of breast cancer patients (malignant and benign). The proposed approach, AP-AMBFA, con...

Parallel Clustering Algorithm for Large-Scale Biological Data Sets

BACKGROUND: The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, ...

Local and global approaches of affinity propagation clustering for large scale data

Recently, a new clustering algorithm called 'affinity propagation' (AP) was proposed, which efficiently clusters sparsely related data by passing messages between data points. However, we want to cluster large-scale data where, in many cases, the similarities are not sparse. This paper presents two variants of AP for grouping large-scale data with a dense similarity matrix. The local approac...

Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers

Because the main weakness of most standard methods, including k-means and hierarchical data clustering, is their sensitivity to initialization and their tendency to become trapped in local minima, this paper proposes a modification of convex data clustering in which there is no need to be particular about how initial values are selected. Due to properly converting the task of optimization to an equivalent...

Journal:
  • J. Artif. Intell. Soft Comput. Res.

Volume 6

Publication date: 2016